Overview

Dataset statistics

Number of variables: 24
Number of observations: 29965
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.5 MiB
Average record size in memory: 192.0 B
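
As a rough cross-check of the memory figures above, the sketch below assumes that each of the 24 columns is stored as 8-byte values (for example int64 or float64). That storage width is an assumption, since the report does not list dtypes; the variable names are illustrative.

```python
# Cross-check of the reported memory figures.
# Assumption: every column holds 8-byte values (e.g. int64/float64).
N_VARIABLES = 24
N_OBSERVATIONS = 29965
BYTES_PER_VALUE = 8  # assumed, not stated in the report

record_size_bytes = N_VARIABLES * BYTES_PER_VALUE
total_bytes = record_size_bytes * N_OBSERVATIONS
total_mib = total_bytes / 2**20

print(f"record size: {record_size_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
```

Under that assumption the record size and the total size agree with the reported 192.0 B and 5.5 MiB.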

Variable types

Numeric: 20
Categorical: 4

Alerts

Repay_Sept is highly correlated with Repay_Aug and 2 other fields [High correlation]
Repay_Aug is highly correlated with Repay_Sept and 7 other fields [High correlation]
Repay_July is highly correlated with Repay_Sept and 9 other fields [High correlation]
Repay_June is highly correlated with Repay_Sept and 10 other fields [High correlation]
Repay_May is highly correlated with Repay_Aug and 8 other fields [High correlation]
Repay_Apr is highly correlated with Repay_Aug and 8 other fields [High correlation]
Bill_Sept is highly correlated with Repay_Aug and 8 other fields [High correlation]
Bill_Aug is highly correlated with Repay_Aug and 10 other fields [High correlation]
Bill_July is highly correlated with Repay_Aug and 11 other fields [High correlation]
Bill_June is highly correlated with Repay_July and 13 other fields [High correlation]
Bill_May is highly correlated with Repay_July and 13 other fields [High correlation]
Bill_Apr is highly correlated with Repay_June and 11 other fields [High correlation]
Pay_Sept is highly correlated with Bill_Sept and 5 other fields [High correlation]
Pay_Aug is highly correlated with Bill_July and 5 other fields [High correlation]
Pay_July is highly correlated with Bill_June and 7 other fields [High correlation]
Pay_June is highly correlated with Bill_June and 6 other fields [High correlation]
Pay_May is highly correlated with Bill_June and 5 other fields [High correlation]
Pay_Apr is highly correlated with Bill_May and 4 other fields [High correlation]
Repay_Sept is highly correlated with Repay_Aug and 3 other fields [High correlation]
Repay_Aug is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_July is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_June is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_May is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_Apr is highly correlated with Repay_Aug and 3 other fields [High correlation]
Bill_Sept is highly correlated with Bill_Aug and 4 other fields [High correlation]
Bill_Aug is highly correlated with Bill_Sept and 4 other fields [High correlation]
Bill_July is highly correlated with Bill_Sept and 4 other fields [High correlation]
Bill_June is highly correlated with Bill_Sept and 4 other fields [High correlation]
Bill_May is highly correlated with Bill_Sept and 4 other fields [High correlation]
Bill_Apr is highly correlated with Bill_Sept and 4 other fields [High correlation]
Repay_Sept is highly correlated with Repay_Aug and 1 other field [High correlation]
Repay_Aug is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_July is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_June is highly correlated with Repay_Aug and 3 other fields [High correlation]
Repay_May is highly correlated with Repay_Aug and 4 other fields [High correlation]
Repay_Apr is highly correlated with Repay_Aug and 5 other fields [High correlation]
Bill_Sept is highly correlated with Bill_Aug and 4 other fields [High correlation]
Bill_Aug is highly correlated with Bill_Sept and 5 other fields [High correlation]
Bill_July is highly correlated with Bill_Sept and 5 other fields [High correlation]
Bill_June is highly correlated with Repay_May and 5 other fields [High correlation]
Bill_May is highly correlated with Repay_Apr and 6 other fields [High correlation]
Bill_Apr is highly correlated with Repay_Apr and 6 other fields [High correlation]
Pay_Sept is highly correlated with Bill_Aug [High correlation]
Pay_Aug is highly correlated with Bill_July [High correlation]
Pay_June is highly correlated with Bill_May [High correlation]
Pay_May is highly correlated with Bill_Apr [High correlation]
LIMIT_BAL is highly correlated with Bill_Sept and 5 other fields [High correlation]
MARRIAGE is highly correlated with AGE [High correlation]
AGE is highly correlated with MARRIAGE [High correlation]
Repay_Sept is highly correlated with Repay_Aug and 5 other fields [High correlation]
Repay_Aug is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_July is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_June is highly correlated with Repay_Sept and 4 other fields [High correlation]
Repay_May is highly correlated with Repay_Sept and 5 other fields [High correlation]
Repay_Apr is highly correlated with Repay_Sept and 5 other fields [High correlation]
Bill_Sept is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
Bill_Aug is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
Bill_July is highly correlated with Bill_Sept and 6 other fields [High correlation]
Bill_June is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
Bill_May is highly correlated with LIMIT_BAL and 8 other fields [High correlation]
Bill_Apr is highly correlated with LIMIT_BAL and 6 other fields [High correlation]
Pay_Sept is highly correlated with Pay_Aug and 2 other fields [High correlation]
Pay_Aug is highly correlated with Bill_July and 3 other fields [High correlation]
Pay_July is highly correlated with LIMIT_BAL and 8 other fields [High correlation]
Pay_June is highly correlated with Pay_Sept and 1 other field [High correlation]
Pay_May is highly correlated with Bill_July and 1 other field [High correlation]
Default is highly correlated with Repay_Sept [High correlation]
Pay_Aug is highly skewed (γ1 = 30.43861292) [Skewed]
Repay_Sept has 14737 (49.2%) zeros [Zeros]
Repay_Aug has 15730 (52.5%) zeros [Zeros]
Repay_July has 15764 (52.6%) zeros [Zeros]
Repay_June has 16455 (54.9%) zeros [Zeros]
Repay_May has 16947 (56.6%) zeros [Zeros]
Repay_Apr has 16286 (54.4%) zeros [Zeros]
Bill_Sept has 1978 (6.6%) zeros [Zeros]
Bill_Aug has 2476 (8.3%) zeros [Zeros]
Bill_July has 2840 (9.5%) zeros [Zeros]
Bill_June has 3165 (10.6%) zeros [Zeros]
Bill_May has 3476 (11.6%) zeros [Zeros]
Bill_Apr has 3990 (13.3%) zeros [Zeros]
Pay_Sept has 5218 (17.4%) zeros [Zeros]
Pay_Aug has 5365 (17.9%) zeros [Zeros]
Pay_July has 5937 (19.8%) zeros [Zeros]
Pay_June has 6377 (21.3%) zeros [Zeros]
Pay_May has 6672 (22.3%) zeros [Zeros]
Pay_Apr has 7142 (23.8%) zeros [Zeros]
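
The alerts above come from simple per-column checks. The sketch below illustrates two of them on toy data: the share of zeros in a column and a Pearson-correlation flag. The function names and the thresholds (`zeros_threshold`, `corr_threshold`) are illustrative assumptions, not the profiler's actual configuration, and the report may also use other correlation measures.

```python
from math import sqrt

def zero_share(values):
    """Fraction of entries equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def alerts(columns, zeros_threshold=0.05, corr_threshold=0.9):
    """Return (column, alert) pairs; both thresholds are assumed values."""
    found = []
    names = list(columns)
    for name in names:
        if zero_share(columns[name]) > zeros_threshold:
            found.append((name, "Zeros"))
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) > corr_threshold:
                found.append((a, "High correlation"))
                found.append((b, "High correlation"))
    return found

# Toy data, not taken from the report.
toy = {
    "bill_a": [0, 100, 200, 300, 400],
    "bill_b": [10, 110, 190, 310, 405],
    "age": [30, 25, 41, 22, 38],
}
result = alerts(toy)

# Percentage for one reported alert: Repay_Sept has 14737 zeros in 29965 rows.
repay_sept_zero_pct = round(100 * 14737 / 29965, 1)
```

On the toy data, `bill_a` is flagged for zeros and the two bill columns are flagged as highly correlated with each other, while `age` is not flagged.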

Reproduction

Analysis started: 2021-10-10 03:16:45.943551
Analysis finished: 2021-10-10 03:18:12.517223
Duration: 1 minute and 26.57 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json
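
The reported duration can be recomputed from the two timestamps. The sketch below is a minimal check using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-10-10 03:16:45.943551", FMT)
finished = datetime.strptime("2021-10-10 03:18:12.517223", FMT)

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```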

Variables

LIMIT_BAL
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 81
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167442.005
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95-th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129760.1352
Coefficient of variation (CV): 0.7749556942
Kurtosis: 0.5375871217
Mean: 167442.005
Median Absolute Deviation (MAD): 90000
Skewness: 0.9934913272
Sum: 5017399680
Variance: 1.683769269 × 10^10
Monotonicity: Not monotonic
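
Several of the LIMIT_BAL statistics above are derived from one another. The sketch below recomputes them from the reported values as a consistency check; the variable names are illustrative and the inputs are copied from the report, not recomputed from the raw data.

```python
# Values copied from the LIMIT_BAL statistics in the report.
mean = 167442.005
std = 129760.1352
minimum, maximum = 10000, 1000000
q1, q3 = 50000, 240000
total, n_obs = 5017399680, 29965

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
mean_from_sum = total / n_obs    # mean recovered from the sum

print(f"CV={cv:.6f}, variance={variance:.4e}, range={value_range}, IQR={iqr}")
```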
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
50000              3363   11.2%
20000              1975   6.6%
30000              1610   5.4%
80000              1564   5.2%
200000             1524   5.1%
150000             1107   3.7%
100000             1047   3.5%
180000             993    3.3%
360000             874    2.9%
60000              825    2.8%
Other values (71)  15083  50.3%

Value  Count  Frequency (%)
10000  493    1.6%
16000  2      < 0.1%
20000  1975   6.6%
30000  1610   5.4%
40000  230    0.8%
50000  3363   11.2%
60000  825    2.8%
70000  731    2.4%
80000  1564   5.2%
90000  650    2.2%

Value    Count  Frequency (%)
1000000  1      < 0.1%
800000   2      < 0.1%
780000   2      < 0.1%
760000   1      < 0.1%
750000   4      < 0.1%
740000   2      < 0.1%
730000   2      < 0.1%
720000   3      < 0.1%
710000   6      < 0.1%
700000   8      < 0.1%

SEX
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 29965
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
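
As a small illustration of the character properties mentioned above, the sketch below looks up the Unicode general category of the two SEX codes with Python's standard `unicodedata` module. The counts are copied from the report; script and block lookups are not part of the standard library, so only the category and a simple ASCII range check are shown.

```python
import unicodedata
from collections import Counter

# The two SEX codes with the counts reported for this variable.
counts = {"2": 18091, "1": 11874}

# Sum the counts per Unicode general category.
category_totals = Counter()
for char, count in counts.items():
    category_totals[unicodedata.category(char)] += count

# "Nd" is the general category "Number, decimal digit".
is_ascii = all(ord(char) < 128 for char in counts)
print(dict(category_totals), is_ascii)
```

Both codes fall into the single category `Nd`, which corresponds to the "Decimal Number" rows of this report.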

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value  Count  Frequency (%)
2      18091  60.4%
1      11874  39.6%


Most occurring characters

Value  Count  Frequency (%)
2      18091  60.4%
1      11874  39.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  29965  100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      18091  60.4%
1      11874  39.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  29965  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      18091  60.4%
1      11874  39.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  29965  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      18091  60.4%
1      11874  39.6%

EDUCATION
Categorical

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size234.2 KiB

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters29965
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row2
3rd row2
4th row2
5th row2

Common Values

Value  Count  Frequency (%)
2      14019  46.8%
1      10563  35.3%
3      4915   16.4%
4      468    1.6%


Most occurring characters

ValueCountFrequency (%)
214019
46.8%
110563
35.3%
34915
 
16.4%
4468
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number29965
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
214019
46.8%
110563
35.3%
34915
 
16.4%
4468
 
1.6%

Most occurring scripts

ValueCountFrequency (%)
Common29965
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
214019
46.8%
110563
35.3%
34915
 
16.4%
4468
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII29965
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
214019
46.8%
110563
35.3%
34915
 
16.4%
4468
 
1.6%

MARRIAGE
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size234.2 KiB

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters29965
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row2
3rd row2
4th row1
5th row1

Common Values

Value  Count  Frequency (%)
2      15945  53.2%
1      13643  45.5%
3      377    1.3%


Most occurring characters

ValueCountFrequency (%)
215945
53.2%
113643
45.5%
3377
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number29965
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
215945
53.2%
113643
45.5%
3377
 
1.3%

Most occurring scripts

ValueCountFrequency (%)
Common29965
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
215945
53.2%
113643
45.5%
3377
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII29965
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
215945
53.2%
113643
45.5%
3377
 
1.3%

AGE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.4879693
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
Median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.219459233
Coefficient of variation (CV): 0.2597911184
Kurtosis: 0.04398801494
Mean: 35.4879693
Median Absolute Deviation (MAD): 6
Skewness: 0.7320560019
Sum: 1063397
Variance: 84.99842855
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
29                 1602   5.3%
27                 1475   4.9%
28                 1406   4.7%
30                 1394   4.7%
26                 1252   4.2%
31                 1213   4.0%
25                 1185   4.0%
34                 1161   3.9%
32                 1157   3.9%
33                 1146   3.8%
Other values (46)  16974  56.6%

Value  Count  Frequency (%)
21     67     0.2%
22     560    1.9%
23     930    3.1%
24     1126   3.8%
25     1185   4.0%
26     1252   4.2%
27     1475   4.9%
28     1406   4.7%
29     1602   5.3%
30     1394   4.7%

Value  Count  Frequency (%)
79     1      < 0.1%
75     3      < 0.1%
74     1      < 0.1%
73     4      < 0.1%
72     3      < 0.1%
71     3      < 0.1%
70     10     < 0.1%
69     15     0.1%
68     5      < 0.1%
67     16     0.1%

Repay_Sept
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.01675287836
Minimum-2
Maximum8
Zeros14737
Zeros (%)49.2%
Negative8432
Negative (%)28.1%
Memory size234.2 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.123492034
Coefficient of variation (CV)-67.06262707
Kurtosis2.730038381
Mean-0.01675287836
Median Absolute Deviation (MAD)1
Skewness0.7346064765
Sum-502
Variance1.26223435
MonotonicityNot monotonic
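
The large negative coefficient of variation for Repay_Sept follows from the mean being very close to zero, which makes the ratio of standard deviation to mean hard to interpret for this variable. The sketch below recomputes both figures from the reported sum and standard deviation; the variable names are illustrative.

```python
# Values copied from the Repay_Sept statistics in the report.
total, n_obs = -502, 29965
std = 1.123492034

mean = total / n_obs  # a small negative mean
cv = std / mean       # dividing by a near-zero mean inflates the ratio

print(f"mean={mean:.10f}, CV={cv:.4f}")
```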
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
014737
49.2%
-15682
 
19.0%
13667
 
12.2%
-22750
 
9.2%
22666
 
8.9%
3322
 
1.1%
476
 
0.3%
526
 
0.1%
819
 
0.1%
611
 
< 0.1%
ValueCountFrequency (%)
-22750
 
9.2%
-15682
 
19.0%
014737
49.2%
13667
 
12.2%
22666
 
8.9%
3322
 
1.1%
476
 
0.3%
526
 
0.1%
611
 
< 0.1%
79
 
< 0.1%
ValueCountFrequency (%)
819
 
0.1%
79
 
< 0.1%
611
 
< 0.1%
526
 
0.1%
476
 
0.3%
3322
 
1.1%
22666
 
8.9%
13667
 
12.2%
014737
49.2%
-15682
 
19.0%

Repay_Aug
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.1318538295
Minimum-2
Maximum8
Zeros15730
Zeros (%)52.5%
Negative9798
Negative (%)32.7%
Memory size234.2 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.196321699
Coefficient of variation (CV)-9.073090281
Kurtosis1.577608705
Mean-0.1318538295
Median Absolute Deviation (MAD)0
Skewness0.7920704147
Sum-3951
Variance1.431185607
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
015730
52.5%
-16046
 
20.2%
23926
 
13.1%
-23752
 
12.5%
3326
 
1.1%
499
 
0.3%
128
 
0.1%
525
 
0.1%
720
 
0.1%
612
 
< 0.1%
ValueCountFrequency (%)
-23752
 
12.5%
-16046
 
20.2%
015730
52.5%
128
 
0.1%
23926
 
13.1%
3326
 
1.1%
499
 
0.3%
525
 
0.1%
612
 
< 0.1%
720
 
0.1%
ValueCountFrequency (%)
81
 
< 0.1%
720
 
0.1%
612
 
< 0.1%
525
 
0.1%
499
 
0.3%
3326
 
1.1%
23926
 
13.1%
128
 
0.1%
015730
52.5%
-16046
 
20.2%

Repay_July
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.1643917904
Minimum-2
Maximum8
Zeros15764
Zeros (%)52.6%
Negative9989
Negative (%)33.3%
Memory size234.2 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.195877509
Coefficient of variation (CV)-7.274557358
Kurtosis2.091665951
Mean-0.1643917904
Median Absolute Deviation (MAD)0
Skewness0.8414639808
Sum-4926
Variance1.430123016
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
015764
52.6%
-15934
 
19.8%
-24055
 
13.5%
23819
 
12.7%
3240
 
0.8%
475
 
0.3%
727
 
0.1%
623
 
0.1%
521
 
0.1%
14
 
< 0.1%
ValueCountFrequency (%)
-24055
 
13.5%
-15934
 
19.8%
015764
52.6%
14
 
< 0.1%
23819
 
12.7%
3240
 
0.8%
475
 
0.3%
521
 
0.1%
623
 
0.1%
727
 
0.1%
ValueCountFrequency (%)
83
 
< 0.1%
727
 
0.1%
623
 
0.1%
521
 
0.1%
475
 
0.3%
3240
 
0.8%
23819
 
12.7%
14
 
< 0.1%
015764
52.6%
-15934
 
19.8%

Repay_June
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct11
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.2189220758
Minimum-2
Maximum8
Zeros16455
Zeros (%)54.9%
Negative10001
Negative (%)33.4%
Memory size234.2 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.168175186
Coefficient of variation (CV)-5.33603193
Kurtosis3.508962108
Mean-0.2189220758
Median Absolute Deviation (MAD)0
Skewness1.000798562
Sum-6560
Variance1.364633266
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
ValueCountFrequency (%)
016455
54.9%
-15683
 
19.0%
-24318
 
14.4%
23159
 
10.5%
3180
 
0.6%
468
 
0.2%
758
 
0.2%
535
 
0.1%
65
 
< 0.1%
82
 
< 0.1%
ValueCountFrequency (%)
-24318
 
14.4%
-15683
 
19.0%
016455
54.9%
12
 
< 0.1%
23159
 
10.5%
3180
 
0.6%
468
 
0.2%
535
 
0.1%
65
 
< 0.1%
758
 
0.2%
ValueCountFrequency (%)
82
 
< 0.1%
758
 
0.2%
65
 
< 0.1%
535
 
0.1%
468
 
0.2%
3180
 
0.6%
23159
 
10.5%
12
 
< 0.1%
016455
54.9%
-15683
 
19.0%

Repay_May
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.2645085934
Minimum-2
Maximum8
Zeros16947
Zeros (%)56.6%
Negative10051
Negative (%)33.5%
Memory size234.2 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.132219856
Coefficient of variation (CV)-4.280465302
Kurtosis4.003562263
Mean-0.2645085934
Median Absolute Deviation (MAD)0
Skewness1.009329021
Sum-7926
Variance1.281921802
MonotonicityNot monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
016947
56.6%
-15535
 
18.5%
-24516
 
15.1%
22626
 
8.8%
3178
 
0.6%
483
 
0.3%
758
 
0.2%
517
 
0.1%
64
 
< 0.1%
81
 
< 0.1%
ValueCountFrequency (%)
-24516
 
15.1%
-15535
 
18.5%
016947
56.6%
22626
 
8.8%
3178
 
0.6%
483
 
0.3%
517
 
0.1%
64
 
< 0.1%
758
 
0.2%
81
 
< 0.1%
ValueCountFrequency (%)
81
 
< 0.1%
758
 
0.2%
64
 
< 0.1%
517
 
0.1%
483
 
0.3%
3178
 
0.6%
22626
 
8.8%
016947
56.6%
-15535
 
18.5%
-24516
 
15.1%

Repay_Apr
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.2894376773
Minimum-2
Maximum8
Zeros16286
Zeros (%)54.4%
Negative10601
Negative (%)35.4%
Memory size234.2 KiB

Quantile statistics

Minimum-2
5-th percentile-2
Q1-1
median0
Q30
95-th percentile2
Maximum8
Range10
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.1490901
Coefficient of variation (CV)-3.970077809
Kurtosis3.437256875
Mean-0.2894376773
Median Absolute Deviation (MAD)0
Skewness0.9486089933
Sum-8673
Variance1.320408057
MonotonicityNot monotonic
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%)
016286
54.4%
-15736
 
19.1%
-24865
 
16.2%
22766
 
9.2%
3184
 
0.6%
448
 
0.2%
746
 
0.2%
619
 
0.1%
513
 
< 0.1%
82
 
< 0.1%
ValueCountFrequency (%)
-24865
 
16.2%
-15736
 
19.1%
016286
54.4%
22766
 
9.2%
3184
 
0.6%
448
 
0.2%
513
 
< 0.1%
619
 
0.1%
746
 
0.2%
82
 
< 0.1%
ValueCountFrequency (%)
82
 
< 0.1%
746
 
0.2%
619
 
0.1%
513
 
< 0.1%
448
 
0.2%
3184
 
0.6%
22766
 
9.2%
016286
54.4%
-15736
 
19.1%
-24865
 
16.2%

Bill_Sept
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct22723
Distinct (%)75.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean51283.00978
Minimum-165580
Maximum964511
Zeros1978
Zeros (%)6.6%
Negative590
Negative (%)2.0%
Memory size234.2 KiB

Quantile statistics

Minimum-165580
5-th percentile0
Q13595
median22438
Q367260
95-th percentile201303.8
Maximum964511
Range1130091
Interquartile range (IQR)63665

Descriptive statistics

Standard deviation73658.1324
Coefficient of variation (CV)1.436306736
Kurtosis9.796846218
Mean51283.00978
Median Absolute Deviation (MAD)21842
Skewness2.662513456
Sum1536695388
Variance5425520469
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
01978
 
6.6%
390243
 
0.8%
78076
 
0.3%
32672
 
0.2%
31663
 
0.2%
250059
 
0.2%
39648
 
0.2%
240039
 
0.1%
41629
 
0.1%
105025
 
0.1%
Other values (22713)27333
91.2%
ValueCountFrequency (%)
-1655801
< 0.1%
-1549731
< 0.1%
-153081
< 0.1%
-143861
< 0.1%
-115451
< 0.1%
-106821
< 0.1%
-98021
< 0.1%
-90951
< 0.1%
-81871
< 0.1%
-74381
< 0.1%
ValueCountFrequency (%)
9645111
< 0.1%
7468141
< 0.1%
6530621
< 0.1%
6304581
< 0.1%
6266481
< 0.1%
6217491
< 0.1%
6138601
< 0.1%
6107231
< 0.1%
6085941
< 0.1%
6040191
< 0.1%

Bill_Aug
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct22346
Distinct (%)74.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean49236.36629
Minimum-69777
Maximum983931
Zeros2476
Zeros (%)8.3%
Negative669
Negative (%)2.2%
Memory size234.2 KiB

Quantile statistics

Minimum-69777
5-th percentile0
Q13010
median21295
Q364109
95-th percentile194889.6
Maximum983931
Range1053708
Interquartile range (IQR)61099

Descriptive statistics

Standard deviation71195.56739
Coefficient of variation (CV)1.445995567
Kurtosis10.29321199
Mean49236.36629
Median Absolute Deviation (MAD)20905
Skewness2.70386174
Sum1475367716
Variance5068808816
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02476
 
8.3%
390230
 
0.8%
32675
 
0.3%
78075
 
0.3%
31672
 
0.2%
250051
 
0.2%
39650
 
0.2%
240042
 
0.1%
-20029
 
0.1%
41628
 
0.1%
Other values (22336)26837
89.6%
ValueCountFrequency (%)
-697771
< 0.1%
-675261
< 0.1%
-333501
< 0.1%
-300001
< 0.1%
-262141
< 0.1%
-247041
< 0.1%
-247021
< 0.1%
-229601
< 0.1%
-186181
< 0.1%
-180881
< 0.1%
ValueCountFrequency (%)
9839311
< 0.1%
7439701
< 0.1%
6715631
< 0.1%
6467701
< 0.1%
6244751
< 0.1%
6059431
< 0.1%
5977931
< 0.1%
5868251
< 0.1%
5817751
< 0.1%
5776811
< 0.1%

Bill_July
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct22026
Distinct (%)73.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean47067.91607
Minimum-157264
Maximum1664089
Zeros2840
Zeros (%)9.5%
Negative655
Negative (%)2.2%
Memory size234.2 KiB

Quantile statistics

Minimum-157264
5-th percentile0
Q12711
median20135
Q360201
95-th percentile187901
Maximum1664089
Range1821353
Interquartile range (IQR)57490

Descriptive statistics

Standard deviation69371.35232
Coefficient of variation (CV)1.473856464
Kurtosis19.77100256
Mean47067.91607
Median Absolute Deviation (MAD)19745
Skewness3.086493832
Sum1410390105
Variance4812384523
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02840
 
9.5%
390274
 
0.9%
78074
 
0.2%
32663
 
0.2%
31662
 
0.2%
39647
 
0.2%
250040
 
0.1%
240039
 
0.1%
41629
 
0.1%
20027
 
0.1%
Other values (22016)26470
88.3%
ValueCountFrequency (%)
-1572641
< 0.1%
-615061
< 0.1%
-461271
< 0.1%
-340411
< 0.1%
-254431
< 0.1%
-247021
< 0.1%
-203201
< 0.1%
-177061
< 0.1%
-159101
< 0.1%
-156411
< 0.1%
ValueCountFrequency (%)
16640891
< 0.1%
8550861
< 0.1%
6931311
< 0.1%
6896431
< 0.1%
6896271
< 0.1%
6320411
< 0.1%
5974151
< 0.1%
5789711
< 0.1%
5779571
< 0.1%
5770151
< 0.1%

Bill_June
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct21548
Distinct (%)71.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean43313.32988
Minimum-170000
Maximum891586
Zeros3165
Zeros (%)10.6%
Negative675
Negative (%)2.3%
Memory size234.2 KiB

Quantile statistics

Minimum-170000
5-th percentile0
Q12360
median19081
Q354601
95-th percentile174469.8
Maximum891586
Range1061586
Interquartile range (IQR)52241

Descriptive statistics

Standard deviation64353.51437
Coefficient of variation (CV)1.485766958
Kurtosis11.29858229
Mean43313.32988
Median Absolute Deviation (MAD)18681
Skewness2.820544832
Sum1297883930
Variance4141374812
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
03165
 
10.6%
390245
 
0.8%
780101
 
0.3%
31668
 
0.2%
32662
 
0.2%
39643
 
0.1%
15039
 
0.1%
240039
 
0.1%
250034
 
0.1%
100033
 
0.1%
Other values (21538)26136
87.2%
ValueCountFrequency (%)
-1700001
< 0.1%
-813341
< 0.1%
-651671
< 0.1%
-506161
< 0.1%
-466271
< 0.1%
-345031
< 0.1%
-274901
< 0.1%
-243031
< 0.1%
-221081
< 0.1%
-203201
< 0.1%
ValueCountFrequency (%)
8915861
< 0.1%
7068641
< 0.1%
6286991
< 0.1%
6168361
< 0.1%
5728051
< 0.1%
5690341
< 0.1%
5656691
< 0.1%
5635431
< 0.1%
5480201
< 0.1%
5426531
< 0.1%

Bill_May
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct21010
Distinct (%)70.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean40358.33439
Minimum-81334
Maximum927171
Zeros3476
Zeros (%)11.6%
Negative655
Negative (%)2.2%
Memory size234.2 KiB

Quantile statistics

Minimum-81334
5-th percentile0
Q11787
median18130
Q350247
95-th percentile165805.6
Maximum927171
Range1008505
Interquartile range (IQR)48460

Descriptive statistics

Standard deviation60817.13062
Coefficient of variation (CV)1.506928657
Kurtosis12.29453891
Mean40358.33439
Median Absolute Deviation (MAD)17714
Skewness2.874925049
Sum1209337490
Variance3698723377
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
03476
 
11.6%
390234
 
0.8%
78094
 
0.3%
31679
 
0.3%
32662
 
0.2%
15058
 
0.2%
39646
 
0.2%
240039
 
0.1%
250037
 
0.1%
41636
 
0.1%
Other values (21000)25804
86.1%
ValueCountFrequency (%)
-813341
< 0.1%
-613721
< 0.1%
-530071
< 0.1%
-466271
< 0.1%
-375941
< 0.1%
-361561
< 0.1%
-304811
< 0.1%
-283351
< 0.1%
-230031
< 0.1%
-207531
< 0.1%
ValueCountFrequency (%)
9271711
< 0.1%
8235401
< 0.1%
5870671
< 0.1%
5517021
< 0.1%
5478801
< 0.1%
5306721
< 0.1%
5243151
< 0.1%
5161391
< 0.1%
5141141
< 0.1%
5082131
< 0.1%

Bill_Apr
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct20604
Distinct (%)68.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38917.01228
Minimum-339603
Maximum961664
Zeros3990
Zeros (%)13.3%
Negative688
Negative (%)2.3%
Memory size234.2 KiB

Quantile statistics

Minimum-339603
5-th percentile0
Q11262
median17124
Q349252
95-th percentile161932
Maximum961664
Range1301267
Interquartile range (IQR)47990

Descriptive statistics

Standard deviation59574.14774
Coefficient of variation (CV)1.530799623
Kurtosis12.25912611
Mean38917.01228
Median Absolute Deviation (MAD)16808
Skewness2.845137169
Sum1166148273
Variance3549079079
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
03990
 
13.3%
390206
 
0.7%
78086
 
0.3%
15078
 
0.3%
31677
 
0.3%
32656
 
0.2%
39644
 
0.1%
41636
 
0.1%
-1833
 
0.1%
240032
 
0.1%
Other values (20594)25327
84.5%
ValueCountFrequency (%)
-3396031
< 0.1%
-2090511
< 0.1%
-1509531
< 0.1%
-946251
< 0.1%
-738951
< 0.1%
-570601
< 0.1%
-514431
< 0.1%
-511831
< 0.1%
-466271
< 0.1%
-457341
< 0.1%
ValueCountFrequency (%)
9616641
< 0.1%
6999441
< 0.1%
5686381
< 0.1%
5277111
< 0.1%
5275661
< 0.1%
5149751
< 0.1%
5137981
< 0.1%
5119051
< 0.1%
5013701
< 0.1%
4991001
< 0.1%

Pay_Sept
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct7943
Distinct (%)26.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5670.099316
Minimum0
Maximum873552
Zeros5218
Zeros (%)17.4%
Negative0
Negative (%)0.0%
Memory size234.2 KiB

Quantile statistics

Minimum0
5-th percentile0
Q11000
median2102
Q35008
95-th percentile18447.2
Maximum873552
Range873552
Interquartile range (IQR)4008

Descriptive statistics

Standard deviation16571.84947
Coefficient of variation (CV)2.92267358
Kurtosis414.8548633
Mean5670.099316
Median Absolute Deviation (MAD)1929
Skewness14.66159454
Sum169904526
Variance274626194.7
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05218
 
17.4%
20001363
 
4.5%
3000891
 
3.0%
5000698
 
2.3%
1500507
 
1.7%
4000426
 
1.4%
10000401
 
1.3%
1000365
 
1.2%
2500298
 
1.0%
6000294
 
1.0%
Other values (7933)19504
65.1%
ValueCountFrequency (%)
05218
17.4%
19
 
< 0.1%
214
 
< 0.1%
315
 
0.1%
418
 
0.1%
512
 
< 0.1%
615
 
0.1%
79
 
< 0.1%
88
 
< 0.1%
97
 
< 0.1%
ValueCountFrequency (%)
8735521
< 0.1%
5050001
< 0.1%
4933581
< 0.1%
4239031
< 0.1%
4050161
< 0.1%
3681991
< 0.1%
3230141
< 0.1%
3048151
< 0.1%
3020001
< 0.1%
3000391
< 0.1%

Pay_Aug
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct7899
Distinct (%)26.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5927.98318
Minimum0
Maximum1684259
Zeros5365
Zeros (%)17.9%
Negative0
Negative (%)0.0%
Memory size234.2 KiB

Quantile statistics

Minimum0
5-th percentile0
Q1850
median2010
Q35000
95-th percentile19030.8
Maximum1684259
Range1684259
Interquartile range (IQR)4150

Descriptive statistics

Standard deviation23053.45664
Coefficient of variation (CV)3.888920724
Kurtosis1639.924451
Mean5927.98318
Median Absolute Deviation (MAD)1990
Skewness30.43861292
Sum177632016
Variance531461863.3
MonotonicityNot monotonic
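
Pay_Aug is flagged as skewed because a few very large payments dominate the distribution. The sketch below computes a simple moment-based skewness, g1 = m3 / m2^1.5, on toy data that is not taken from the dataset. The profiler may use a bias-adjusted estimator, so this is an illustration of the idea rather than a reproduction of the reported value.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Toy payment amounts (hypothetical): many small values and one very large one.
toy_payments = [0, 0, 1000, 1500, 2000, 2000, 3000, 5000, 1_000_000]
symmetric = [1, 2, 3, 4, 5]

right_skew = skewness(toy_payments)  # strongly positive
no_skew = skewness(symmetric)        # zero for a symmetric sample
```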
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05365
 
17.9%
20001290
 
4.3%
3000857
 
2.9%
5000717
 
2.4%
1000594
 
2.0%
1500521
 
1.7%
4000410
 
1.4%
10000318
 
1.1%
6000283
 
0.9%
2500251
 
0.8%
Other values (7889)19359
64.6%
ValueCountFrequency (%)
05365
17.9%
115
 
0.1%
220
 
0.1%
318
 
0.1%
411
 
< 0.1%
525
 
0.1%
68
 
< 0.1%
712
 
< 0.1%
89
 
< 0.1%
96
 
< 0.1%
ValueCountFrequency (%)
16842591
< 0.1%
12270821
< 0.1%
12154711
< 0.1%
10245161
< 0.1%
5804641
< 0.1%
4155521
< 0.1%
4010031
< 0.1%
3881261
< 0.1%
3852281
< 0.1%
3849861
< 0.1%

Pay_July
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct7518
Distinct (%)25.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5231.688837
Minimum0
Maximum896040
Zeros5937
Zeros (%)19.8%
Negative0
Negative (%)0.0%
Memory size234.2 KiB

Quantile statistics

Minimum0
5-th percentile0
Q1390
median1804
Q34512
95-th percentile17602.6
Maximum896040
Range896040
Interquartile range (IQR)4122

Descriptive statistics

Standard deviation17616.36112
Coefficient of variation (CV)3.367241759
Kurtosis563.7392771
Mean5231.688837
Median Absolute Deviation (MAD)1796
Skewness17.2081766
Sum156767556
Variance310336179.3
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
05937
 
19.8%
20001285
 
4.3%
10001103
 
3.7%
3000870
 
2.9%
5000721
 
2.4%
1500490
 
1.6%
4000381
 
1.3%
10000312
 
1.0%
1200243
 
0.8%
6000241
 
0.8%
Other values (7508)18382
61.3%
ValueCountFrequency (%)
05937
19.8%
113
 
< 0.1%
219
 
0.1%
314
 
< 0.1%
415
 
0.1%
518
 
0.1%
614
 
< 0.1%
718
 
0.1%
810
 
< 0.1%
912
 
< 0.1%
ValueCountFrequency (%)
8960401
< 0.1%
8890431
< 0.1%
5082291
< 0.1%
4175881
< 0.1%
4009721
< 0.1%
3970921
< 0.1%
3804781
< 0.1%
3717181
< 0.1%
3493951
< 0.1%
3442611
< 0.1%

Pay_June
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 6937
Distinct (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4831.617454
Minimum: 0
Maximum: 621000
Zeros: 6377
Zeros (%): 21.3%
Negative: 0
Negative (%): 0.0%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 300
Median: 1500
Q3: 4016
95-th percentile: 16037
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3716

Descriptive statistics

Standard deviation: 15674.46454
Coefficient of variation (CV): 3.244144365
Kurtosis: 277.0486932
Mean: 4831.617454
Median Absolute Deviation (MAD): 1500
Skewness: 12.89850649
Sum: 144779417
Variance: 245688838.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 6377 | 21.3%
1000 | 1394 | 4.7%
2000 | 1214 | 4.1%
3000 | 887 | 3.0%
5000 | 810 | 2.7%
1500 | 441 | 1.5%
4000 | 402 | 1.3%
10000 | 341 | 1.1%
2500 | 259 | 0.9%
500 | 258 | 0.9%
Other values (6927) | 17582 | 58.7%

Smallest values

Value | Count | Frequency (%)
0 | 6377 | 21.3%
1 | 22 | 0.1%
2 | 22 | 0.1%
3 | 13 | < 0.1%
4 | 20 | 0.1%
5 | 12 | < 0.1%
6 | 16 | 0.1%
7 | 11 | < 0.1%
8 | 7 | < 0.1%
9 | 9 | < 0.1%

Largest values

Value | Count | Frequency (%)
621000 | 1 | < 0.1%
528897 | 1 | < 0.1%
497000 | 1 | < 0.1%
432130 | 1 | < 0.1%
400046 | 1 | < 0.1%
331788 | 1 | < 0.1%
330982 | 1 | < 0.1%
320008 | 1 | < 0.1%
313094 | 1 | < 0.1%
292962 | 1 | < 0.1%

Pay_May
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 6897
Distinct (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4804.897047
Minimum: 0
Maximum: 426529
Zeros: 6672
Zeros (%): 22.3%
Negative: 0
Negative (%): 0.0%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 261
Median: 1500
Q3: 4042
95-th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3781

Descriptive statistics

Standard deviation: 15286.3723
Coefficient of variation (CV): 3.181415158
Kurtosis: 179.8752095
Mean: 4804.897047
Median Absolute Deviation (MAD): 1500
Skewness: 11.12174174
Sum: 143978740
Variance: 233673178
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 6672 | 22.3%
1000 | 1340 | 4.5%
2000 | 1323 | 4.4%
3000 | 947 | 3.2%
5000 | 814 | 2.7%
1500 | 426 | 1.4%
4000 | 401 | 1.3%
10000 | 343 | 1.1%
500 | 250 | 0.8%
6000 | 247 | 0.8%
Other values (6887) | 17202 | 57.4%

Smallest values

Value | Count | Frequency (%)
0 | 6672 | 22.3%
1 | 21 | 0.1%
2 | 13 | < 0.1%
3 | 13 | < 0.1%
4 | 12 | < 0.1%
5 | 9 | < 0.1%
6 | 7 | < 0.1%
7 | 9 | < 0.1%
8 | 6 | < 0.1%
9 | 6 | < 0.1%

Largest values

Value | Count | Frequency (%)
426529 | 1 | < 0.1%
417990 | 1 | < 0.1%
388071 | 1 | < 0.1%
379267 | 1 | < 0.1%
332000 | 1 | < 0.1%
331788 | 1 | < 0.1%
330982 | 1 | < 0.1%
326889 | 1 | < 0.1%
317077 | 1 | < 0.1%
310135 | 1 | < 0.1%

Pay_Apr
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6939
Distinct (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5221.498014
Minimum: 0
Maximum: 528666
Zeros: 7142
Zeros (%): 23.8%
Negative: 0
Negative (%): 0.0%
Memory size: 234.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 131
Median: 1500
Q3: 4000
95-th percentile: 17384.4
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3869

Descriptive statistics

Standard deviation: 17786.97686
Coefficient of variation (CV): 3.406489252
Kurtosis: 166.9817897
Mean: 5221.498014
Median Absolute Deviation (MAD): 1500
Skewness: 10.63509397
Sum: 156462188
Variance: 316376546
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 7142 | 23.8%
1000 | 1299 | 4.3%
2000 | 1295 | 4.3%
3000 | 914 | 3.1%
5000 | 808 | 2.7%
1500 | 439 | 1.5%
4000 | 411 | 1.4%
10000 | 356 | 1.2%
500 | 247 | 0.8%
6000 | 220 | 0.7%
Other values (6929) | 16834 | 56.2%

Smallest values

Value | Count | Frequency (%)
0 | 7142 | 23.8%
1 | 20 | 0.1%
2 | 9 | < 0.1%
3 | 14 | < 0.1%
4 | 12 | < 0.1%
5 | 7 | < 0.1%
6 | 6 | < 0.1%
7 | 5 | < 0.1%
8 | 6 | < 0.1%
9 | 7 | < 0.1%

Largest values

Value | Count | Frequency (%)
528666 | 1 | < 0.1%
527143 | 1 | < 0.1%
443001 | 1 | < 0.1%
422000 | 1 | < 0.1%
403500 | 1 | < 0.1%
377000 | 1 | < 0.1%
372495 | 1 | < 0.1%
351282 | 1 | < 0.1%
345293 | 1 | < 0.1%
308000 | 1 | < 0.1%

Default
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.2 KiB

Value | Count
0 | 23335
1 | 6630

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 29965
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 23335 | 77.9%
1 | 6630 | 22.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
0 | 23335 | 77.9%
1 | 6630 | 22.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 23335 | 77.9%
1 | 6630 | 22.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 29965 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 23335 | 77.9%
1 | 6630 | 22.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 29965 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 23335 | 77.9%
1 | 6630 | 22.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 29965 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 23335 | 77.9%
1 | 6630 | 22.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
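
The calculation described above can be sketched in plain Python. This is only an illustration, not the report's own implementation: the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, and tied values are assumed to receive the mean of their rank positions.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations (1/n cancels)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship (toy data, not from the report).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"rho={rho:.3f} r={r:.3f}")
```

On this toy data the ranks of x and y agree exactly, so ρ reaches its maximum while Pearson's r stays below 1, which is the behaviour the description refers to.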

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
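
As a hedged illustration of this definition and of the invariance property, the following plain-Python sketch computes r on toy data and then again after shifting and rescaling each variable. The function name `pearson_r` and the data are assumptions made for the example; the scale factors are kept positive, since a negative factor would flip the sign of r.

```python
from statistics import mean

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are used.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Toy data (not taken from the report).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Separate changes in location and (positive) scale leave r unchanged.
x_moved = [2 * a + 3 for a in x]
y_moved = [0.5 * b - 1 for b in y]
r_moved = pearson_r(x_moved, y_moved)

print(f"r={r:.3f} r_moved={r_moved:.3f}")
```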

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
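
The pair counting can be sketched as follows. This is the plain form described above, with no correction for ties (often called τ-a); the report's own computation may use a tie-corrected variant, so treat the function name `kendall_tau` and the toy data as assumptions.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (not taken from the report): 8 concordant and 2 discordant pairs.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)
print(tau)
```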

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
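
A minimal sketch of a bias-corrected Cramér's V, computed from a contingency table in plain Python, is shown below. It follows the commonly cited form of Bergsma's correction (shrinking φ² and the table dimensions by terms in 1/(n-1)); the function name `cramers_v_corrected` is hypothetical, and the report's own implementation may differ in detail.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrected phi squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Toy tables (not from the report): independence and perfect association.
v_independent = cramers_v_corrected([[25, 25], [25, 25]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(v_independent, v_perfect)
```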

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
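
To illustrate what these two views summarize, here is a small plain-Python sketch that counts missing entries per column and prints a text version of a nullity matrix. The rows are hypothetical toy records with gaps added on purpose; the dataset in this report has no missing cells.

```python
# Hypothetical toy records; None marks a missing cell.
rows = [
    {"LIMIT_BAL": 20000, "AGE": 24, "Default": 1},
    {"LIMIT_BAL": None, "AGE": 26, "Default": 1},
    {"LIMIT_BAL": 90000, "AGE": None, "Default": 0},
]
columns = ["LIMIT_BAL", "AGE", "Default"]

# Nullity by column: number of missing cells in each column.
missing_per_column = {c: sum(row[c] is None for row in rows) for c in columns}

# Nullity matrix: one line per row, '#' for a present cell and '.' for a missing one.
nullity_matrix = [
    "".join("." if row[c] is None else "#" for c in columns) for row in rows
]

print(missing_per_column)
for line in nullity_matrix:
    print(line)
```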

Sample

First rows

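Row | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | Repay_Sept | Repay_Aug | Repay_July | Repay_June | Repay_May | Repay_Apr | Bill_Sept | Bill_Aug | Bill_July | Bill_June | Bill_May | Bill_Apr | Pay_Sept | Pay_Aug | Pay_July | Pay_June | Pay_May | Pay_Apr | Default
0 | 20000 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913 | 3102 | 689 | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | 1
1 | 120000 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682 | 1725 | 2682 | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | 1
2 | 90000 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239 | 14027 | 13559 | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | 0
3 | 50000 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990 | 48233 | 49291 | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | 0
4 | 50000 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617 | 5670 | 35835 | 20940 | 19146 | 19131 | 2000 | 36681 | 10000 | 9000 | 689 | 679 | 0
5 | 50000 | 1 | 1 | 2 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 64400 | 57069 | 57608 | 19394 | 19619 | 20024 | 2500 | 1815 | 657 | 1000 | 1000 | 800 | 0
6 | 500000 | 1 | 1 | 2 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 367965 | 412023 | 445007 | 542653 | 483003 | 473944 | 55000 | 40000 | 38000 | 20239 | 13750 | 13770 | 0
7 | 100000 | 2 | 2 | 2 | 23 | 0 | -1 | -1 | 0 | 0 | -1 | 11876 | 380 | 601 | 221 | -159 | 567 | 380 | 601 | 0 | 581 | 1687 | 1542 | 0
8 | 140000 | 2 | 3 | 1 | 28 | 0 | 0 | 2 | 0 | 0 | 0 | 11285 | 14096 | 12108 | 12211 | 11793 | 3719 | 3329 | 0 | 432 | 1000 | 1000 | 1000 | 0
9 | 20000 | 1 | 3 | 2 | 35 | -2 | -2 | -2 | -2 | -1 | -1 | 0 | 0 | 0 | 0 | 13007 | 13912 | 0 | 0 | 0 | 13007 | 1122 | 0 | 0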

Last rows

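Row | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | Repay_Sept | Repay_Aug | Repay_July | Repay_June | Repay_May | Repay_Apr | Bill_Sept | Bill_Aug | Bill_July | Bill_June | Bill_May | Bill_Apr | Pay_Sept | Pay_Aug | Pay_July | Pay_June | Pay_May | Pay_Apr | Default
29955 | 140000 | 1 | 2 | 1 | 41 | 0 | 0 | 0 | 0 | 0 | 0 | 138325 | 137142 | 139110 | 138262 | 49675 | 46121 | 6000 | 7000 | 4228 | 1505 | 2000 | 2000 | 0
29956 | 210000 | 1 | 2 | 1 | 34 | 3 | 2 | 2 | 2 | 2 | 2 | 2500 | 2500 | 2500 | 2500 | 2500 | 2500 | 0 | 0 | 0 | 0 | 0 | 0 | 1
29957 | 10000 | 1 | 3 | 1 | 43 | 0 | 0 | 0 | -2 | -2 | -2 | 8802 | 10400 | 0 | 0 | 0 | 0 | 2000 | 0 | 0 | 0 | 0 | 0 | 0
29958 | 100000 | 1 | 1 | 2 | 38 | 0 | -1 | -1 | 0 | 0 | 0 | 3042 | 1427 | 102996 | 70626 | 69473 | 55004 | 2000 | 111784 | 4000 | 3000 | 2000 | 2000 | 0
29959 | 80000 | 1 | 2 | 2 | 34 | 2 | 2 | 2 | 2 | 2 | 2 | 72557 | 77708 | 79384 | 77519 | 82607 | 81158 | 7000 | 3500 | 0 | 7000 | 0 | 4000 | 1
29960 | 220000 | 1 | 3 | 1 | 39 | 0 | 0 | 0 | 0 | 0 | 0 | 188948 | 192815 | 208365 | 88004 | 31237 | 15980 | 8500 | 20000 | 5003 | 3047 | 5000 | 1000 | 0
29961 | 150000 | 1 | 3 | 2 | 43 | -1 | -1 | -1 | -1 | 0 | 0 | 1683 | 1828 | 3502 | 8979 | 5190 | 0 | 1837 | 3526 | 8998 | 129 | 0 | 0 | 0
29962 | 30000 | 1 | 2 | 2 | 37 | 4 | 3 | 2 | -1 | 0 | 0 | 3565 | 3356 | 2758 | 20878 | 20582 | 19357 | 0 | 0 | 22000 | 4200 | 2000 | 3100 | 1
29963 | 80000 | 1 | 3 | 1 | 41 | 1 | -1 | 0 | 0 | 0 | -1 | -1645 | 78379 | 76304 | 52774 | 11855 | 48944 | 85900 | 3409 | 1178 | 1926 | 52964 | 1804 | 1
29964 | 50000 | 1 | 2 | 1 | 46 | 0 | 0 | 0 | 0 | 0 | 0 | 47929 | 48905 | 49764 | 36535 | 32428 | 15313 | 2078 | 1800 | 1430 | 1000 | 1000 | 1000 | 1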